The Complexity of FFT and Related Butterfly Algorithms on Meshes and Hypermeshes

Author

  • Ted H. Szymanski
Abstract

Parallel FFT data-flow graphs based on a butterfly graph followed by a bit-reversal permutation are known, as are optimal-order embeddings of these flow graphs onto meshes and hypercubes. Embeddings onto a 2D mesh require O(√N) data transfer steps and O(log N) computation steps. Embeddings onto a hypercube require O(log N) data transfer steps and O(log N) computation steps. A similar FFT algorithm, with O(log N) computation steps and O(log N) data transfer steps, is presented for the recently proposed "hypermesh". The performance complexity of the FFT algorithm on all three interconnection networks is then compared, based on the assumptions that (1) all networks are built with discrete crossbar switches interconnected with transmission lines, (2) all networks compared have equivalent aggregate bandwidth, and (3) the packet transmission time is inversely proportional to the link bandwidth. The algorithms are viewed at the "word level" of abstraction, where every packet is treated as an indivisible unit. Under these assumptions, it is concluded that for practical network sizes the 2D hypermesh is faster than the 2D mesh and the binary hypercube by factors of O(√N / log N) and O(log N), respectively. For the computation of a 4K-sample FFT on 4K-processor networks, the hypermesh is roughly 27 times faster than a 2D mesh and roughly 10 times faster than a binary hypercube. Variations in the assumptions may affect the end results slightly; these conclusions may not hold when the network is implemented entirely on a single wafer, but this scenario is unlikely for the next decade or two. These complexity results indicate that the hypermesh is the preferred interconnection scheme in discrete-component constructions of parallel supercomputers.
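The flow graph the abstract refers to (log N butterfly stages followed by a bit-reversal permutation) can be illustrated with a minimal sequential sketch. This is not the paper's parallel algorithm: the function names `bit_reverse`, `butterfly_fft`, and `transfer_hops` are hypothetical, and the hop count assumes one sample per processor, a row-major layout on a square mesh without wraparound links, and no link contention. It ignores the paper's bandwidth-normalization assumptions, so it only shows why the mesh needs on the order of √N transfer steps while the hypercube needs log N; it does not reproduce the reported factors of 27 and 10.

```python
import cmath


def bit_reverse(i, bits):
    """Return i with its lowest `bits` bits reversed."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def butterfly_fft(x):
    """Radix-2 decimation-in-frequency FFT: log2(N) butterfly stages, then bit reversal."""
    n = len(x)
    bits = n.bit_length() - 1
    if n == 0 or (1 << bits) != n:
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    for s in reversed(range(bits)):
        half = 1 << s
        for i in range(n):
            if i & half == 0:
                j = i | half  # butterfly partner: differs from i only in bit s
                w = cmath.exp(-2j * cmath.pi * (i % half) / (2 * half))
                a[i], a[j] = a[i] + a[j], (a[i] - a[j]) * w
    # Stage outputs are in bit-reversed order; undo that permutation.
    return [a[bit_reverse(k, bits)] for k in range(n)]


def transfer_hops(n):
    """Link hops needed for the log2(N) partner exchanges (assumed layouts, no contention).

    Returns (hypercube_hops, mesh_hops) for N = n processors.
    """
    bits = n.bit_length() - 1
    if n < 4 or (1 << bits) != n or bits % 2:
        raise ValueError("n must be an even power of two, at least 4")
    half_bits = bits // 2
    # Hypercube: node i and node i XOR 2**s are direct neighbours, one hop per stage.
    hypercube = bits
    # Mesh, row-major: the low half of the index bits moves along a row,
    # the high half along a column, each by a distance of 2**(s mod half_bits).
    mesh = sum(1 << (s % half_bits) for s in range(bits))
    return hypercube, mesh
```

Under these assumptions, `transfer_hops(4096)` gives 12 hops for the hypercube and 2(√N − 1) = 126 hops for the 64 × 64 mesh, which matches the O(log N) and O(√N) orders quoted in the abstract.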


Similar resources

ASIC Design of Butterfly Unit Based on Non-Redundant and Redundant Algorithm

Fast Fourier Transform (FFT) processors with a pipeline architecture consist of a series of Processing Elements (PEs) or Butterfly Units (BUs). The BU or PE of an FFT performs multiplication and addition on complex numbers. This paper proposes a single BU that computes a radix-2, 8-point FFT in the time domain as well as the frequency domain, replacing a series of PEs. This BU comprises fused floating ...


Complex Multiplication Reduction in FFT Processors

The number of multiplications has been used as a key metric for comparing FFT algorithms, since it has a large impact on execution time and total power consumption. In this paper, we present a 16-point FFT Butterfly PE, which reduces the multiplicative complexity by using real, constant multiplications. A 1024-point FFT processor has been implemented using 16-point and 4-point Butterfly PEs...


Design and Implementation of Floating-Point Butterfly Architecture Based on Multi-Operand Adders

This paper addresses the FFT processor and the FFT butterfly structure, including the reading, writing, and execution addresses. The Fast Fourier Transform (FFT) coprocessor, which has a noticeable impact on the performance of communication systems, has been a hot topic of research for many years. The FFT operates on a series of complex numbers, and its butterfly units have to add and multiply. FFT archit...


Implementation of FFT Butterfly Algorithm Using SMB Recoding Techniques

Arithmetic operations of high complexity are widely used in Digital Signal Processing (DSP) applications. FFT algorithms use the butterfly method to compute the output. The butterfly method includes an addition followed by a multiplication. In this work, we focus on optimizing the design of the fused Add-Multiply (FAM) operator to increase its performance and hence that of the FFT. Optimization of...


Improving the RX Anomaly Detection Algorithm for Hyperspectral Images using FFT

Anomaly Detection (AD) has recently become an important application of target detection in hyperspectral images. The Reed-Xiaoli (RX) detector is the most widely used AD algorithm, but it suffers from the "small sample size" problem. The best solution for this problem is to use Dimensionality Reduction (DR) techniques as a pre-processing step for the RX detector. Using this method not only improves the detection p...




Publication date: 1992